<html>
<head><meta charset="utf-8"><title>perf/dtrace experiences and user stories · turbowish · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/index.html">turbowish</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html">perf/dtrace experiences and user stories</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="234231588"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234231588" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pnkfelix <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234231588">(Apr 12 2021 at 21:08)</a>:</h4>
<p>I want to gather user stories, potentially on this topic</p>



<a name="234233701"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234233701" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pnkfelix <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234233701">(Apr 12 2021 at 21:24)</a>:</h4>
<p>context: <span class="user-mention" data-user-id="119031">@Esteban Küber</span> made a <a href="https://twitter.com/ekuber/status/1381716376553811969?s=20">tweet</a></p>
<div class="inline-preview-twitter"><div class="twitter-tweet"><a href="https://twitter.com/ekuber/status/1381716376553811969?s=20"><img class="twitter-avatar" src="https://pbs.twimg.com/profile_images/1212821966467420161/Jymc4Y_T_normal.jpg"></a><p>Are you a perf or dtrace user? Have you used either to gain insight or solve a problem on a program written in <a href="https://twitter.com/rustlang">@rustlang</a>, C or another language? <a href="https://twitter.com/pnkfelix">@pnkfelix</a> and I want to hear about your use cases and workflow. Reach out to either of us if you would like to talk.</p><span>- Esteban K – 🦀⚙️@home (@ekuber)</span></div></div>



<a name="234235054"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234235054" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> nagisa <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234235054">(Apr 12 2021 at 21:36)</a>:</h4>
<p>I felt my laptop hot, noticed mpv consuming 70% of CPU time for no apparent reason… one <code>perf record</code> later I realised hardware decoding was not working.</p>



<a name="234235113"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234235113" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> nagisa <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234235113">(Apr 12 2021 at 21:37)</a>:</h4>
<p>On the other hand I tried to do the same with firefox many times, never with much success.</p>



<a name="234235262"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234235262" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> nagisa <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234235262">(Apr 12 2021 at 21:38)</a>:</h4>
<p>Using <code>perf</code> with C codebases feels _much_ better than with Rust or C++ ones.</p>



<a name="234238874"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234238874" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> iximeow <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234238874">(Apr 12 2021 at 22:15)</a>:</h4>
<p>hi i'm here because of <span class="user-mention" data-user-id="119031">@Esteban Küber</span>'s tweet too <span aria-label="wave" class="emoji emoji-1f44b" role="img" title="wave">:wave:</span></p>
<p>i've used perf to profile rust code somewhat carefully, and tweeted about it a smidge: <a href="https://twitter.com/iximeow/status/1373735782620438528">https://twitter.com/iximeow/status/1373735782620438528</a> - i've run into two major issues (kind of touched on in the other <code>project-turbowish</code>stream). first, with several levels of inlining, it's hard to tell what code is from what functions, and what changes i'd even want to make. with _large_ functions it can be very hard to follow where a value even comes from, or where it would end up used. i think these are more perf-oriented problems though -there isn't a way to look at a slice of a program by value, and there's no way to ask "where did this value come from".</p>
<div class="inline-preview-twitter"><div class="twitter-tweet"><a href="https://twitter.com/iximeow/status/1373735782620438528"><img class="twitter-avatar" src="https://pbs.twimg.com/profile_images/1315507426217476096/VYMLImHY_normal.png"></a><p><a href="https://twitter.com/stephentyrone">@stephentyrone</a> <a href="https://twitter.com/pkhuong">@pkhuong</a> <a href="https://twitter.com/johnregehr">@johnregehr</a> i'm surprised by the answer! sequences like

mov byte [rsp + 0x81], r10b
mov byte [rsp + 0x80], r9b
mov byte [rsp + 0x82], r8b

i also see no writes to rsp + 0x83?</p><span>- iximeow (@iximeow)</span></div></div><p>the other contributing factor in my case is trying to diagnose pipeline stalls, which perf is questionable for in the first place. but compounded again by heavy inlining because it can be hard to map the data flow in the compiled program to whatever logic i fed to rustc in the first place</p>



<a name="234239068"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234239068" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> iximeow <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234239068">(Apr 12 2021 at 22:18)</a>:</h4>
<p>when looking more at async-heavy services, generally things have worked fine. our hotspots have been something being surprisingly blocking, which is a bit different than what perf is good at pointing me at. long function/symbol names are unwieldy but again i think that's more a <code>perf</code> issue than something rust can fix</p>



<a name="234239962"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234239962" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> iximeow <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234239962">(Apr 12 2021 at 22:27)</a>:</h4>
<p>one related note, when looking at lock contention (ultimately related to the "surprisingly blocking" debugging earlier) we've found ebpf scripts like <a href="https://github.com/prathyushpv/klockstat/blob/master/lockstat.py">https://github.com/prathyushpv/klockstat/blob/master/lockstat.py</a> to be _extremely_ useful. again, not sure what to do with that from a user-facing tool perspective. for one, i've got no idea where to even try inserting instrumentation if i wanted to try somewhere higher in the stack than syscalls. between name mangling and inlining it's hard to know in advance what i'd look for, and when to suspect weird monomorphization-related optimization vs an inner inlined method behaving weird in one codepath</p>



<a name="234240740"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234240740" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Esteban Küber <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234240740">(Apr 12 2021 at 22:34)</a>:</h4>
<p><span class="user-mention silent" data-user-id="123586">nagisa</span> <a href="#narrow/stream/284843-project-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories/near/234235262">said</a>:</p>
<blockquote>
<p>Using <code>perf</code> with C codebases feels _much_ better than with Rust or C++ ones.</p>
</blockquote>
<p>Could you expand on that a little bit?</p>



<a name="234242157"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234242157" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> nagisa <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234242157">(Apr 12 2021 at 22:52)</a>:</h4>
<p><span class="user-mention silent" data-user-id="119031">Esteban Küber</span> <a href="#narrow/stream/284843-project-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories/near/234240740">said</a>:</p>
<blockquote>
<p>Could you expand on that a little bit?</p>
</blockquote>
<p>Fair, sure. I think it is related to the fact that C codebases don't utilize monomorphization, first and foremost. When you see a symbol in perf, you almost definitely know where and what it is. It is also easy to grep for callers and whatnot. Also helps that the symbols are not mangled.</p>
<p>I don't know what it is about the C compilation model but it tends to be easy to just tell what's expensive in a perf report at a glance, both in a symbol tree overview as well as in the single function disassembly. Whereas for Rust or C++ projects I end up staring at the thing, trying various visualization methods, and still not really getting it.</p>
<p>I guess it also can mean that a typical Rust/C++ project don't have hotspots as bad as a typical C project, which seems likely enough given that C projects do tend to re-implement various data structures from scratch.</p>



<a name="234247944"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234247944" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jed <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234247944">(Apr 12 2021 at 23:56)</a>:</h4>
<p>I'd echo the above, though I've spent more time profiling C. I tend to work on parallel (MPI and GPU-offload) numerical software and use <code>perf record</code> to make flamegraphs and traces (often via <a href="https://github.com/KDAB/hotspot/">hotspot</a>, <a href="http://profiler.firefox.com">profiler.firefox.com</a>, or speedscope). From there, I narrow down and often look at assembly annotated with source (works quite well in C) and check for vectorization and other poor codegen. Sometimes I'm looking for memory access patterns (say, from poor orderings in sparse matrices resulting in excessive cache misses).</p>



<a name="234323461"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234323461" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pnkfelix <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234323461">(Apr 13 2021 at 13:22)</a>:</h4>
<p><span class="user-mention silent" data-user-id="405045">iximeow</span> <a href="#narrow/stream/284843-project-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories/near/234239068">said</a>:</p>
<blockquote>
<p>when looking more at async-heavy services, generally things have worked fine. our hotspots have been something being surprisingly blocking, which is a bit different than what perf is good at pointing me at. long function/symbol names are unwieldy but again i think that's more a <code>perf</code> issue than something rust can fix</p>
</blockquote>
<p>Yes. My current proposed plan is to tackle the “surprsingly blocking” issues as a separate problem, with separate tooling. (See recently merged <a href="https://rust-lang.github.io/wg-async-foundations/vision/shiny_future/barbara_makes_a_wish.html">shiny future story</a>.)</p>



<a name="234325849"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234325849" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pnkfelix <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234325849">(Apr 13 2021 at 13:36)</a>:</h4>
<p>(So I put this up on twitter too, but I want to repeat it here since I figure the people who came to Zulip are willing to go the extra mile to help out):</p>
<blockquote>
<p>I'd specifically love to see short, uncurated screencasts (or other recordings such as asciinema) of programmers actively using either tool for real-world development (on C, or Rust, or other). Any length (10min, or an hour); I just want to see you operate while "in the zone."</p>
</blockquote>
<p><code>https://twitter.com/pnkfelix/status/1381963220810919936</code></p>



<a name="234348265"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234348265" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> The 8472 <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234348265">(Apr 13 2021 at 15:28)</a>:</h4>
<blockquote>
<p>which is a bit different than what perf is good at pointing me at.</p>
</blockquote>
<p>Perf can be used to trace context switch/descheduling events too. If your blocking IO should be on a different thread pool you can filter those threads out and then focus on what gets the thread pool that should be CPU bound descheduled</p>



<a name="234830939"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234830939" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> DPC <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234830939">(Apr 16 2021 at 10:37)</a>:</h4>
<p>i think it would be nice to have a way to do incremental performance checks - ways you can run tools, make changes and then see the delta instead of having to do it manually</p>



<a name="234853013"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/284843-turbowish/topic/perf/dtrace%20experiences%20and%20user%20stories/near/234853013" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Stu <a href="https://rust-lang.github.io/zulip_archive/stream/284843-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories.html#234853013">(Apr 16 2021 at 13:30)</a>:</h4>
<p><span class="user-mention silent" data-user-id="120823">DPC</span> <a href="#narrow/stream/284843-project-turbowish/topic/perf.2Fdtrace.20experiences.20and.20user.20stories/near/234830939">said</a>:</p>
<blockquote>
<p>i think it would be nice to have a way to do incremental performance checks - ways you can run tools, make changes and then see the delta instead of having to do it manually</p>
</blockquote>
<p>I had great success using <a href="https://github.com/plasma-umass/coz">coz</a> which does something like this internally, by slowing down or "speeding up" specific branches of your program</p>



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>